Explaining Black-Box Models through Counterfactuals

JuliaCon 2022

Patrick Altmeyer

using Pkg; Pkg.activate("dev")
using Plots, PlotThemes
theme(:wong)
include("dev/utils.jl")
www_path = "dev/resources/juliacon22/www"

Overview

  • The Problem with Black Boxes ⬛
    • What are black-box models? Why do we need explainability?
  • Enter: Counterfactual Explanations 🔮
    • What are counterfactuals? What are they not?
  • CounterfactualExplanations.jl in Julia (and beyond!) 📦
    • Why do we need this package?
    • Package architecture
    • Usage examples - what can it do?
  • Goals and Ambitions 🎯
    • Future developments - where can it go?
    • Contributor’s guide

The Problem with Black Boxes ⬛

Of Short Lists, Pandas and Gibbons

  • From human to data-driven decision-making:
    • Black-box models like deep neural networks are being deployed virtually everywhere.
    • It is more likely than not that your loan or employment application is handled by an algorithm.
    • Includes critical domains: health care, autonomous driving, finance, …

A recipe for disaster …

  • We have no idea what exactly we’re cooking up …
    • Have you received an automated rejection email? Why didn’t you “mEet tHe sHoRtLisTiNg cRiTeRia”? 🙃
  • … but we do know that some of it is junk.

“Weapons of Math Destruction”

  • If left unchallenged, these properties of black-box models can create undesirable dynamics:
    • Human operators in charge of the system have to rely on it blindly.
    • Those individuals subject to it generally have no way to challenge an outcome.

“You cannot appeal to (algorithms). They do not listen. Nor do they bend.”

— Cathy O’Neil in Weapons of Math Destruction, 2016

Figure 2: Cathy O’Neil. Source: Cathy O’Neil

Towards Trustworthy AI

Data

Probabilistic Models

Counterfactual Reasoning

Probabilistic Models

Model objective: maximize \(p(\mathcal{D}|\theta)\) where \(\mathcal{D}=\{(x_i,y_i)\}_{i=1}^n\) (supervised)

  • In an ideal world we can rely on parsimonious and interpretable models.
  • In practice these models have their limitations.
  • Black-box models like deep neural networks are the very opposite of parsimonious.

[…] deep neural networks are typically very underspecified by the available data, and […] parameters [therefore] correspond to a diverse variety of compelling explanations for the data. (Wilson 2020)

  • In this setting it is often crucial to treat models probabilistically!
  • Probabilistic models covered briefly today. More in my other talk …
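As a toy illustration of the model objective, consider i.i.d. Bernoulli data: maximizing \(p(\mathcal{D}|\theta)\) over \(\theta\) recovers the sample mean. The snippet below is purely illustrative and unrelated to the package:

```julia
# Toy illustration of maximizing p(D|θ): for i.i.d. Bernoulli labels,
# the maximum-likelihood estimate of θ is the sample mean.
y = [1, 0, 1, 1, 0]                  # hypothetical binary labels
loglik(θ) = sum(yi -> yi * log(θ) + (1 - yi) * log(1 - θ), y)
θ̂ = sum(y) / length(y)               # closed-form MLE: 3/5 = 0.6

# θ̂ attains the highest log-likelihood on a grid of candidate values:
all(loglik(θ̂) ≥ loglik(θ) for θ in 0.01:0.01:0.99)
```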

Counterfactual Reasoning

  • Counterfactual reasoning boils down to a simple question: what if \(x \Rightarrow x^\prime\)?
  • By (strategically) perturbing features and checking the model output, we can (begin to) understand how the model makes its decisions.

Even though […] interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the “black box”. (Wachter, Mittelstadt, and Russell 2017)

Enter: Counterfactual Explanations 🔮

A Framework for Counterfactual Explanations

  • The objective originally proposed by Wachter, Mittelstadt, and Russell (2017) is as follows, where \(h\) measures the complexity of the counterfactual and \(M\) denotes the classifier:

\[ \min_{x^\prime \in \mathcal{X}} h(x^\prime) \quad \text{s.t.} \quad M(x^\prime) = t \qquad(1)\]

  • Typically approximated through regularization:

\[ x^\prime = \arg \min_{x^\prime} \ell(M(x^\prime),t) + \lambda h(x^\prime) \qquad(2)\]
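To make Equation 2 concrete, here is a minimal sketch (not the package implementation) that runs gradient descent on \(x^\prime\) for a hand-coded logistic classifier, with \(h(x^\prime)\) chosen as the squared distance to the factual; all names and values are illustrative:

```julia
# Minimal sketch of Eq. 2 for a hand-coded logistic classifier.
σ(z) = 1 / (1 + exp(-z))
w, b = [1.0, 1.0], 0.0           # hypothetical fitted coefficients
M(x) = σ(w' * x + b)             # classifier output p(y=1|x)

# Gradient descent on ℓ(M(x′),t) + λ‖x′ - x‖² with analytic gradients:
function counterfactual(x, t; λ=0.01, η=0.1, steps=2000)
    x′ = copy(x)
    for _ in 1:steps
        p = M(x′)
        ∇ = 2 * (p - t) * p * (1 - p) .* w .+ 2λ .* (x′ .- x)
        x′ .-= η .* ∇
    end
    return x′
end

x = [-2.0, -2.0]                  # factual, classified as 0
x′ = counterfactual(x, 1.0)       # counterfactual, classified as 1
```

Here \(\ell\) is a simple squared error for brevity; \(\lambda\) trades off validity against closeness to the factual.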

Toy example

A simple counterfactual path from 🐱 to 🐶

We have fitted some black-box classifier to divide cats and dogs. One 🐱 is friends with a lot of cool 🐶 and wants to remain part of that group. The counterfactual path below shows her how to fool the classifier:

Counterfactuals … as in Adversarial Examples?

Yes and no!

While the two are methodologically very similar, adversarial examples are meant to go undetected while counterfactual explanations (CEs) ought to be meaningful.

  • Effective counterfactuals should meet certain criteria ✅

  • closeness: the average distance between factual and counterfactual features should be small (Wachter, Mittelstadt, and Russell 2017)

  • actionability: the proposed feature perturbation should actually be actionable (Ustun, Spangher, and Liu 2019; Poyiadzi et al. 2020)

  • plausibility: the counterfactual explanation should be plausible to a human (Joshi et al. 2019)

  • unambiguity: a human should have no trouble assigning a label to the counterfactual (Schut et al. 2021)

  • sparsity: the counterfactual explanation should involve as few individual feature changes as possible (Schut et al. 2021)

  • robustness: the counterfactual explanation should be robust to domain and model shifts (Upadhyay, Joshi, and Lakkaraju 2021)

  • diversity: ideally multiple diverse counterfactual explanations should be provided (Mothilal, Sharma, and Tan 2020)

  • causality: the counterfactual explanation should reflect the structural causal model underlying the data generating process (Karimi et al. 2020; Karimi, Schölkopf, and Valera 2021)
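Two of these desiderata are straightforward to quantify. The helper names below are mine (illustrative), not part of the package:

```julia
# Illustrative metrics for two desiderata: closeness (L2 distance between
# factual and counterfactual) and sparsity (number of changed features).
closeness(x, x′) = sqrt(sum(abs2, x′ .- x))
sparsity(x, x′; atol=1e-8) = count(abs.(x′ .- x) .> atol)

x  = [1.0, 2.0, 3.0]   # factual
x′ = [1.0, 2.5, 3.0]   # counterfactual: only one feature changed
closeness(x, x′)        # 0.5
sparsity(x, x′)         # 1
```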

Counterfactuals … as in Causal Inference?

NO!

  • In causal inference (Potential Outcome Framework), counterfactuals are unobserved states of the world that we would like to observe in order to establish causality.
    • For example, in drug trials we would love to observe the actual outcome under treatment (\(y_i|T=1\)) and non-treatment (\(y_i|T=0\)) for the same individual.
    • Instead we estimate an average treatment effect as: \(\text{ATE} = \mathbb{E}[y_i|T=1] - \mathbb{E}[y_i|T=0]\)
  • CE involves perturbing features after some model has been trained.
    • We end up comparing modeled outcomes \(f(x_i)\) and \(f(x_i^\prime)\).
  • The two are NOT the same!

Well, maybe …

There is nonetheless an intriguing link between the two domains:

  • If we do have causal knowledge, let’s leverage it: from minimal perturbations to minimal interventions (Karimi et al. (2020), Karimi, Schölkopf, and Valera (2021))
  • If CEs that rely on minimal interventions fail, does that not provide some evidence that the assumed causal graph is inaccurate?
  • Very much an open research field …

Probabilistic Methods for Counterfactual Explanations

When people say that counterfactuals should look realistic or plausible, they really mean that counterfactuals should be generated by the same Data Generating Process (DGP) as the factuals:

\[ x^\prime \sim p(x) \]

  • How do we estimate \(p(x)\)? Two probabilistic approaches …
  • Schut et al. (2021) note that by maximizing predictive probabilities \(\sigma(M(x^\prime))\) for probabilistic models \(M\in\mathcal{\widetilde{M}}\) one implicitly minimizes epistemic and aleatoric uncertainty.
  • Instead of perturbing samples directly, some have proposed to instead traverse a lower-dimensional latent embedding learned through a generative model.
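The latent-space idea can be sketched with a toy linear "decoder" standing in for a trained generative model; everything below is illustrative, not the package's REVISE implementation:

```julia
# Toy sketch of latent-space search: perturb a latent code z and decode,
# rather than perturbing x directly. The linear "decoder" D is a stand-in
# for a trained generative model (e.g. a VAE decoder).
σ(z) = 1 / (1 + exp(-z))
D = [1.0 0.0; 0.0 1.0; 1.0 1.0]     # hypothetical decoder: x = D * z
w, b = [1.0, -1.0, 0.5], 0.0        # hypothetical classifier weights on x
M(x) = σ(w' * x + b)

function latent_counterfactual(z, t; η=0.1, steps=2000, δ=1e-5)
    z′ = copy(z)
    ℓ(z) = (M(D * z) - t)^2          # loss as a function of the latent code
    for _ in 1:steps
        # finite-difference gradient with respect to z′:
        g = [(ℓ(z′ .+ δ .* (1:length(z′) .== j)) - ℓ(z′)) / δ for j in 1:length(z′)]
        z′ .-= η .* g
    end
    return D * z′                    # decoded counterfactual
end

z  = [-1.0, 1.0]                     # latent code of the factual
x′ = latent_counterfactual(z, 1.0)   # counterfactual decoded from latent space
```

Because every candidate is decoded from the latent space, the search stays close to the learned data manifold by construction.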

Counterfactual Explanations in Julia (and beyond!)

Limited Software Availability

  • Some of the existing approaches are scattered across different GitHub repositories (🐍).
  • Only one unifying Python 🐍 library: CARLA (Pawelczyk et al. 2021).
    • Comprehensive and (somewhat) extensible …
    • … but not language-agnostic and some desirable functionality not supported.
    • Also not composable: each generator is treated as different class/entity.
  • Both R and Julia have been lacking any kind of implementation. Until now …

Enter: CounterfactualExplanations.jl 📦


  • A unifying framework for generating Counterfactual Explanations.
  • Built in Julia, but essentially language agnostic:
    • Currently supporting explanations for differentiable models built in Julia (e.g. Flux) and torch (R and Python).
  • Designed to be easily extensible through dispatch.
  • Designed to be composable allowing users and developers to combine different counterfactual generators.

Julia has an edge with respect to Trustworthy AI: it’s open-source, uniquely transparent and interoperable 🔴🟢🟣

Package Architecture

Modular, composable, scalable!

Overview

Figure 4: Overview of package architecture. Modules are shown in red, structs in green and functions in blue.

Generators

using CounterfactualExplanations, Plots, GraphRecipes
plt = plot(AbstractGenerator, method=:tree, fontsize=10, nodeshape=:rect, size=(1000,700))
savefig(plt, joinpath(www_path,"generators.png"))

Figure 5: Type tree for AbstractGenerator.

Models

plt = plot(AbstractFittedModel, method=:tree, fontsize=10, nodeshape=:rect, size=(1000,700))
savefig(plt, joinpath(www_path,"models.png"))

Figure 6: Type tree for AbstractFittedModel.

Basic Usage

A simple example

  1. Load and prepare some toy data.
  2. Select a random sample.
  3. Generate counterfactuals using different approaches.
# Data:
using CounterfactualExplanations.Data
N = 100
xs, ys = Data.toy_data_linear(N)
X = hcat(xs...)
counterfactual_data = CounterfactualData(X,ys')

# Randomly selected factual:
x = select_factual(counterfactual_data, rand(1:size(X,2)))
y = round(probs(M, x)[1])           # M is the fitted model (defined below)
target = ifelse(y==1.0,0.0,1.0)     # opposite label as target

Generic Generator

Code

# Model
using CounterfactualExplanations.Models: LogisticModel
w = [1.0 1.0] # estimated coefficients
b = 0 # estimated bias
M = LogisticModel(w, [b])

# Counterfactual search:
generator = GenericGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)

Output

Greedy Generator

Code

using LinearAlgebra
Σ = Symmetric(reshape(randn(9),3,3).*0.01 + UniformScaling(1)) # MAP covariance matrix
μ = hcat(b, w)
M = CounterfactualExplanations.Models.BayesianLogisticModel(μ, Σ)

# Counterfactual search:
generator = GreedyGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)

Output

REVISE Generator

Code

# Counterfactual search:
generator = REVISEGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)
# Counterfactual search:
generator = GenericGenerator()
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator,
  latent_space=true
)

Output

Customization

Custom Models - Deep Ensemble

Loading the pre-trained deep ensemble …

ensemble = mnist_ensemble() # deep ensemble

Step 1: add composite type as subtype of AbstractFittedModel.

struct FittedEnsemble <: Models.AbstractFittedModel
    ensemble::AbstractArray
end

Step 2: dispatch logits and probs methods for new model type.

using Flux, Statistics
import CounterfactualExplanations.Models: logits, probs
logits(M::FittedEnsemble, X::AbstractArray) = mean(Flux.stack([nn(X) for nn in M.ensemble],3), dims=3)
probs(M::FittedEnsemble, X::AbstractArray) = mean(Flux.stack([softmax(nn(X)) for nn in M.ensemble],3),dims=3)
M = FittedEnsemble(ensemble)
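The averaging logic above can be seen with plain arrays (no Flux needed); the member predictions here are made up:

```julia
# Sketch of the ensemble-averaging logic: each "member" returns a probability
# vector and the ensemble prediction is their elementwise mean.
using Statistics

member_preds = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]  # hypothetical outputs
ensemble_pred = mean(member_preds)                    # elementwise average
```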

Results for a simple deep ensemble also look convincing!

Custom Models - Interoperability

Adding support for torch models was easy! Here’s how I implemented it for torch classifiers trained in R.

Source code

Step 1: add composite type as subtype of AbstractFittedModel

Done here.

Step 2: dispatch logits and probs methods for new model type.

Done here.

Step 3: add gradient access.

Done here.

Unchanged API

using RCall
synthetic = load_synthetic([:r_torch])
model = synthetic[:classification_binary][:models][:r_torch][:raw_model]
M = RTorchModel(model)
# Define generator:
generator = GenericGenerator()
# Generate recourse:
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)

Custom Generators

Idea 💡: let’s implement a generic generator with dropout!

Dispatch

Step 1: create a subtype of AbstractGradientBasedGenerator (adhering to some basic rules).

# Constructor:
struct DropoutGenerator <: AbstractGradientBasedGenerator
    loss::Symbol # loss function
    complexity::Function # complexity function
    mutability::Union{Nothing,Vector{Symbol}} # mutability constraints
    λ::AbstractFloat # strength of penalty
    ϵ::AbstractFloat # step size
    τ::AbstractFloat # tolerance for convergence
    p_dropout::AbstractFloat # dropout rate
end

Step 2: implement logic for generating perturbations.

import CounterfactualExplanations.Generators: generate_perturbations, ∇
using StatsBase
function generate_perturbations(generator::DropoutGenerator, counterfactual_state::CounterfactualState)
    𝐠ₜ = ∇(generator, counterfactual_state) # gradient
    # Dropout: zero out a random subset of gradient entries
    set_to_zero = sample(1:length(𝐠ₜ), Int(round(generator.p_dropout*length(𝐠ₜ))), replace=false)
    𝐠ₜ[set_to_zero] .= 0
    Δx′ = - (generator.ϵ .* 𝐠ₜ) # gradient step
    return Δx′
end
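The dropout step on its own can be illustrated outside the package with nothing but the standard library (the function name is mine, not the package's):

```julia
# Standalone illustration of gradient dropout: zero out a random fraction p
# of the gradient entries before taking a descent step.
using Random

function dropout_grad(g::Vector{Float64}, p::Float64; rng=Random.default_rng())
    k = round(Int, p * length(g))            # number of entries to drop
    idx = randperm(rng, length(g))[1:k]      # random subset without replacement
    g′ = copy(g)
    g′[idx] .= 0.0
    return g′
end

g = collect(1.0:10.0)
count(iszero, dropout_grad(g, 0.5))  # 5 entries zeroed
```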

Unchanged API

# Instantiate:
using LinearAlgebra
generator = DropoutGenerator(
    :logitbinarycrossentropy,
    norm,
    nothing,
    0.1,
    0.1,
    1e-5,
    0.5
)
counterfactual = generate_counterfactual(
  x, target, counterfactual_data, M, generator
)

Goals and Ambitions 🎯

JuliaCon 2022 and beyond

More Resources

Hidden

Explainable AI (XAI)

  • interpretable = inherently interpretable model, no extra tools needed (GLM, decision trees, rules, …) (Rudin 2019)
  • explainable = inherently not interpretable model, but explainable through XAI

Post-hoc Explainability:

  • Local surrogate explainers like LIME and Shapley: useful and popular, but …
    • … can be easily fooled (Slack et al. 2020)
    • … rely on reasonably interpretable features.
    • … rely on the concept of fidelity.
  • Counterfactual explanations explain how inputs into a system need to change for it to produce different decisions.
    • Always full-fidelity, since no proxy involved.
    • Intuitive interpretation and straightforward implementation.
    • Works well with Bayesian models. Clear link to Causal Inference.
    • Does not rely on interpretable features.
  • Realistic and actionable changes can be used for the purpose of algorithmic recourse.

Feature Constraints

Mutability constraints can be added at the preprocessing stage:

counterfactual_data = CounterfactualData(X,ys';domain=[(-Inf,Inf),(-Inf,-0.5)])
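The effect of such a domain constraint can be mimicked with a plain clamp (illustrative only, not the package's internals):

```julia
# Mimic a domain constraint by clamping each feature into its interval:
# here the second feature is restricted to at most -0.5.
domain = [(-Inf, Inf), (-Inf, -0.5)]
apply_domain(x, domain) = [clamp(xᵢ, lo, hi) for (xᵢ, (lo, hi)) in zip(x, domain)]

apply_domain([0.3, 0.2], domain)  # [0.3, -0.5]
```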

Research Topics (1) - Student Project

What happens once algorithmic recourse (AR) has actually been implemented? 👀

Research Topics (2)

  • An effortless way to incorporate model uncertainty (w/o need for expensive generative model): Laplace Redux.
  • Counterfactual explanations for time series data.
  • Is CE really more intuitive? Could run a user-based study like in Kaur et al. (2020).
  • More ideas from your side? 🤗

References

Antorán, Javier, Umang Bhatt, Tameem Adel, Adrian Weller, and José Miguel Hernández-Lobato. 2020. “Getting a Clue: A Method for Explaining Uncertainty Estimates.” arXiv Preprint arXiv:2006.06848.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv Preprint arXiv:1412.6572.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” arXiv Preprint arXiv:1907.09615.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. “Algorithmic Recourse: From Counterfactual Explanations to Interventions.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 353–62.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal Knowledge: A Probabilistic Approach.” arXiv Preprint arXiv:2006.06831.
Kaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. “Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–14.
Mothilal, Ramaravind K, Amit Sharma, and Chenhao Tan. 2020. “Explaining Machine Learning Classifiers Through Diverse Counterfactual Explanations.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–17.
Pawelczyk, Martin, Sascha Bielawski, Johannes van den Heuvel, Tobias Richter, and Gjergji Kasneci. 2021. “Carla: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms.” arXiv Preprint arXiv:2108.00783.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations by Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. “Fooling Lime and Shap: Adversarial Attacks on Post Hoc Explanation Methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–86.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” arXiv Preprint arXiv:2102.13620.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841.
Wilson, Andrew Gordon. 2020. “The Case for Bayesian Deep Learning.” arXiv Preprint arXiv:2001.10995.